Generative artificial intelligence (Generative AI, GenAI, or GAI) is a subfield of artificial intelligence that uses generative models to produce text, images, videos, or other forms of data. These models learn the underlying patterns and structures of their training data and use them to produce new data based on the input, which often comes in the form of natural language prompts.
Generative AI tools have become more common since the AI boom in the 2020s. This boom was made possible by improvements in transformer-based deep learning neural networks, particularly large language models (LLMs). Major tools include chatbots such as ChatGPT, Copilot, Gemini, Claude, Grok, and DeepSeek; text-to-image models such as Stable Diffusion, Midjourney, and DALL-E; and text-to-video models such as Veo and Sora. Technology companies developing generative AI include OpenAI, Anthropic, Meta AI, Microsoft, Google, DeepSeek, and Baidu.
Generative AI has raised many ethical questions as it can be used for cybercrime, or to deceive or manipulate people through fake news or deepfakes. Even if used ethically, it may lead to mass replacement of human jobs. The tools themselves have been criticized as violating intellectual property laws, since they are trained on copyrighted works.
Generative AI is used across many industries. Examples include software development, healthcare, finance, entertainment, customer service, sales and marketing, art, writing, fashion, and product design.
Computers were needed to go beyond Markov chains. By the early 1970s, Harold Cohen was creating and exhibiting generative AI works created by AARON, the computer program Cohen created to generate paintings.
The terms generative AI planning or generative planning were used in the 1980s and 1990s to refer to AI planning systems, especially computer-aided process planning, used to generate sequences of actions to reach a specified goal. Generative AI planning systems used symbolic AI methods such as state space search and constraint satisfaction and were a "relatively mature" technology by the early 1990s. They were used, for example, to generate crisis action plans for military use.
In 2014, advancements such as the variational autoencoder and generative adversarial network produced the first practical deep neural networks capable of learning generative models, as opposed to discriminative ones, for complex data such as images. These deep generative models were the first to output not only class labels for images but also entire images.
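The distinction between generative and discriminative models can be illustrated with a minimal sketch in PyTorch (an assumed framework choice, with toy layer sizes): a GAN-style generator produces entire samples from random noise, while the discriminator only outputs a realism score for a given sample.

```python
# Minimal sketch (illustrative only): a generator produces whole data samples,
# while a discriminator only scores them, mirroring the generative vs.
# discriminative distinction described above.
import torch
import torch.nn as nn

latent_dim, data_dim = 64, 784  # e.g. 28x28 images flattened (assumed sizes)

generator = nn.Sequential(          # maps random noise to data-like samples
    nn.Linear(latent_dim, 256), nn.ReLU(),
    nn.Linear(256, data_dim), nn.Tanh(),
)
discriminator = nn.Sequential(      # maps a sample to a single realism score
    nn.Linear(data_dim, 256), nn.LeakyReLU(0.2),
    nn.Linear(256, 1), nn.Sigmoid(),
)

z = torch.randn(16, latent_dim)           # random latent vectors
fake_images = generator(z)                # entire synthetic samples, not labels
realism_scores = discriminator(fake_images)
```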
In 2017, the Transformer network enabled advancements in generative models compared to older long short-term memory (LSTM) models, leading to the first generative pre-trained transformer (GPT), known as GPT-1, in 2018. This was followed in 2019 by GPT-2, which demonstrated the ability to generalize unsupervised to many different tasks as a foundation model.
The new generative models introduced during this period allowed for large neural networks to be trained using unsupervised learning or semi-supervised learning, rather than the supervised learning typical of discriminative models. Unsupervised learning removed the need for humans to manually label data, allowing for larger networks to be trained.
In 2021, the emergence of DALL-E, a transformer-based pixel generative model, marked an advance in AI-generated imagery. This was followed by the releases of Midjourney and Stable Diffusion in 2022, which further democratized access to high-quality artificial intelligence art creation from natural language prompts. These systems demonstrated unprecedented capabilities in generating photorealistic images, artwork, and designs based on text descriptions, leading to widespread adoption among artists, designers, and the general public.
In late 2022, the public release of ChatGPT revolutionized the accessibility and application of generative AI for general-purpose text-based tasks. The system's ability to engage in natural conversations, generate creative content, assist with coding, and perform various analytical tasks captured global attention and sparked widespread discussion about AI's potential impact on work, education, and creativity.
In March 2023, GPT-4's release represented another jump in generative AI capabilities. A team from Microsoft Research controversially argued that it "could reasonably be viewed as an early (yet still incomplete) version of an artificial general intelligence (AGI) system." However, this assessment was contested by other scholars who maintained that generative AI remained "still far from reaching the benchmark of 'general human intelligence'" as of 2023. Later in 2023, Meta Platforms released ImageBind, an AI model combining multiple modalities including text, images, video, thermal data, 3D data, audio, and motion, paving the way for more immersive generative AI applications.
In December 2023, Google unveiled Gemini, a multimodal AI model available in four versions: Ultra, Pro, Flash, and Nano. The company integrated Gemini Pro into its Bard chatbot and announced plans for "Bard Advanced" powered by the larger Gemini Ultra model. In February 2024, Google unified Bard and Duet AI under the Gemini brand, launching a mobile app on Android and integrating the service into the Google app on iOS.
In March 2024, Anthropic released the Claude 3 family of large language models, including Claude 3 Haiku, Sonnet, and Opus. The models demonstrated significant improvements in capabilities across various benchmarks, with Claude 3 Opus notably outperforming leading models from OpenAI and Google. In June 2024, Anthropic released Claude 3.5 Sonnet, which demonstrated improved performance compared to the larger Claude 3 Opus, particularly in areas such as coding, multistep workflows, and image analysis.
Asia–Pacific countries are significantly more optimistic than Western societies about generative AI and show higher adoption rates. Despite expressing concerns about privacy and the pace of change, in a 2024 survey, 68% of Asia-Pacific respondents believed that AI was having a positive impact on the world, compared to 57% globally. According to a survey by SAS Institute and Coleman Parkes Research, China in particular has emerged as a global leader in generative AI adoption, with 83% of Chinese respondents using the technology, exceeding both the global average of 54% and the U.S. rate of 65%. This leadership is further evidenced by China's intellectual property developments in the field, with a UN report revealing that Chinese entities filed over 38,000 generative AI patents from 2014 to 2023, substantially surpassing the United States in patent applications. A 2024 survey on the Chinese social app Soul reported that 18% of respondents born after 2000 used generative AI "almost every day", and that over 60% of respondents like or love AI-generated content, while less than 3% dislike or hate it.
Generative AI has been adopted across a wide variety of industries, changing the dynamics of content creation, analysis, and delivery. In healthcare, generative AI is instrumental in accelerating drug discovery by creating molecular structures with target characteristics and generating radiology images for training diagnostic models. These capabilities not only enable faster and cheaper development but also enhance medical decision-making. In finance, generative AI generates datasets to train models and automates report generation with natural language summarization capabilities. It automates content creation, produces synthetic financial data, and tailors customer communications. It also powers chatbots and virtual agents. Collectively, these technologies enhance efficiency, reduce operational costs, and support data-driven decision-making in financial institutions. The media industry uses generative AI for numerous creative activities such as music composition, scriptwriting, video editing, and digital art. The educational sector is affected as well, since the tools make learning more personalized through creating quizzes, study aids, and essay composition. Both teachers and learners benefit from AI-based platforms that suit various learning patterns.
In addition to natural language text, large language models can be trained on programming language text, allowing them to generate source code for new computer programs. Examples include OpenAI Codex, Tabnine, GitHub Copilot, Microsoft Copilot, and Cursor, a fork of VS Code.
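In practice, such code generation is often driven by prompting a hosted model through an API. The following is a minimal sketch using the OpenAI Python client; the model name and prompt are illustrative assumptions rather than a recommendation of any specific product.

```python
# Minimal sketch of prompting a code-capable LLM through an API.
# Assumes the OpenAI Python client is installed and OPENAI_API_KEY is set.
from openai import OpenAI

client = OpenAI()  # reads the API key from the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",  # assumed model name; substitute any code-capable model
    messages=[
        {"role": "system", "content": "You are a coding assistant."},
        {"role": "user", "content": "Write a Python function that checks whether a string is a palindrome."},
    ],
)
print(response.choices[0].message.content)  # the generated source code
```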
Some AI assistants help candidates cheat during online coding interviews by providing code, improvements, and explanations. Their clandestine interfaces minimize the need for eye movements that would expose cheating to the interviewer.
Generative AI systems such as MusicLM and MusicGen can also be trained on the audio waveforms of recorded music along with text annotations, in order to generate new musical samples based on text descriptions such as "a calming violin melody backed by a distorted guitar riff".
Audio deepfakes of music lyrics have been generated, like the song Savages, which used AI to mimic rapper Jay-Z's vocals. Music artists' instrumentals and lyrics are copyrighted, but their voices are not yet protected from generative AI, raising a debate about whether artists should get royalties from audio deepfakes.
Many AI music generators have been created, with which music can be generated using a text phrase, genre options, and looped libraries of bars and riffs.
Smaller generative AI models with up to a few billion parameters can run on smartphones, embedded devices, and personal computers. For example, LLaMA-7B (a version with 7 billion parameters) can run on a Raspberry Pi 4 and one version of Stable Diffusion can run on an iPhone 11.
Larger models with tens of billions of parameters can run on laptop or desktop computers. To achieve an acceptable speed, models of this size may require AI accelerators such as the GPU chips produced by NVIDIA and AMD or the Neural Engine included in Apple silicon products. For example, the 65 billion parameter version of LLaMA can be configured to run on a desktop PC.
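Running such a model locally typically relies on a compressed (quantized) checkpoint. A minimal sketch, assuming the third-party llama-cpp-python library and a GGUF-format model file downloaded separately (the file name below is hypothetical), might look like this:

```python
# Minimal sketch of local inference with a quantized model via llama-cpp-python.
from llama_cpp import Llama

llm = Llama(
    model_path="./llama-2-7b.Q4_K_M.gguf",  # hypothetical local file; 4-bit quantized
    n_ctx=2048,       # context window size
    n_gpu_layers=-1,  # offload all layers to the GPU if one is available
)

output = llm("Explain what quantization does to a neural network:", max_tokens=128)
print(output["choices"][0]["text"])
```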
The advantages of running generative AI locally include protection of privacy and intellectual property, and avoidance of rate limiting and censorship. The Reddit community r/LocalLLaMA in particular focuses on using consumer-grade gaming graphics cards through such techniques as compression. That forum is one of only two sources Andrej Karpathy trusts for language model benchmarks. Yann LeCun has advocated open-source models for their value to vertical applications and for improving AI safety.
Language models with hundreds of billions of parameters, such as GPT-4 or PaLM, typically run on datacenter computers equipped with arrays of GPUs (such as NVIDIA's H100) or AI accelerator chips (such as Google's TPU). These very large models are typically accessed as cloud computing services over the Internet.
In 2022, the United States' new export controls on advanced computing and semiconductors imposed restrictions on exports to China of the GPU and AI accelerator chips used for generative AI. Chips such as the NVIDIA A800 and the Biren Technology BR104 were developed to meet the requirements of the sanctions.
Free software capable of recognizing text generated by generative artificial intelligence exists (such as GPTZero), as well as tools for detecting images, audio, or video produced by it. Potential mitigation strategies for detecting generative AI content include digital watermarking, data provenance, information retrieval, and machine learning classifier models. Despite claims of accuracy, both free and paid AI text detectors have frequently produced false positives, mistakenly accusing students of submitting AI-generated work.
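The classifier-based approach mentioned above can be illustrated with a toy sketch. The training sentences and labels below are invented purely for illustration; a real detector would be trained on large labeled corpora and, as noted, would still be prone to false positives.

```python
# Toy sketch of a machine-learning classifier for AI-text detection.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = [
    "In conclusion, it is important to note that the aforementioned factors interact.",
    "honestly i just winged the essay the night before lol",
    "This comprehensive overview delves into the multifaceted implications involved.",
    "we argued about the results for an hour and then got pizza",
]
labels = [1, 0, 1, 0]  # 1 = AI-generated (assumed), 0 = human-written (assumed)

detector = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
detector.fit(texts, labels)

# Probability that a new sentence is AI-generated, according to the toy model.
print(detector.predict_proba(["It is crucial to delve into these considerations."])[0][1])
```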
In the European Union, the proposed Artificial Intelligence Act includes requirements to disclose copyrighted material used to train generative AI systems, and to label any AI-generated output as such.
In China, the Interim Measures for the Management of Generative AI Services introduced by the Cyberspace Administration of China regulates any public-facing generative AI. It includes requirements to watermark generated images or videos, regulations on training data and label quality, restrictions on personal data collection, and a guideline that generative AI must "adhere to socialist core values".
Proponents of fair use training have argued that it is a transformative use and does not involve making copies of copyrighted works available to the public. Critics have argued that image generators such as Midjourney can create nearly-identical copies of some copyrighted images, and that generative AI programs compete with the content they are trained on.
As of 2024, several lawsuits related to the use of copyrighted material in training are ongoing. Getty Images has sued Stability AI over the use of its images to train Stable Diffusion. Both the Authors Guild and The New York Times have sued Microsoft and OpenAI over the use of their works to train ChatGPT.
In January 2025, the United States Copyright Office (USCO) released extensive guidance regarding the use of AI tools in the creative process, and established that "...generative AI systems also offer tools that similarly allow users to exert control. These can enable the user to control the selection and placement of individual creative elements. Whether such modifications rise to the minimum standard of originality required under Feist will depend on a case-by-case determination. In those cases where they do, the output should be copyrightable." Subsequently, the USCO registered the first visual artwork to be composed of entirely AI-generated materials, titled "A Single Piece of American Cheese".
AI's effect on employment among underrepresented groups globally remains a critical concern. While AI promises efficiency gains and opportunities for skill acquisition, concerns about job displacement and biased recruiting processes persist among these groups, as outlined in surveys by Fast Company. To leverage AI for a more equitable society, proactive steps include mitigating biases, advocating transparency, respecting privacy and consent, and embracing diverse teams and ethical considerations. Strategies involve redirecting policy emphasis toward regulation, inclusive design, and education's potential for personalized teaching to maximize benefits while minimizing harms.
In July 2023, the fact-checking company Logically found that the popular generative AI models Midjourney, DALL-E 2 and Stable Diffusion would produce plausible disinformation images when prompted to do so, such as images of electoral fraud in the United States and Muslim women supporting India's Hindu nationalist Bharatiya Janata Party.
In April 2024, a paper proposed to use blockchain (distributed ledger technology) to promote "transparency, verifiability, and decentralization in AI development and usage".
Concerns and fandoms have spawned from AI-generated music. The same software used to clone voices has been used on famous musicians' voices to create songs that mimic their voices, gaining both tremendous popularity and criticism. Similar techniques have also been used to create improved quality or full-length versions of songs that have been leaked or have yet to be released.
Generative AI has also been used to create new digital artist personalities, with some of these receiving enough attention to receive record deals at major labels. The developers of these virtual artists have also faced criticism for their personified programs, including backlash for "dehumanizing" an artform, and for creating artists that make unrealistic or immoral appeals to their audiences.
A 2023 study showed that generative AI can be vulnerable to jailbreaks, reverse psychology and prompt injection attacks, enabling attackers to obtain help with harmful requests, such as crafting social engineering and phishing attacks. Additionally, other researchers have demonstrated that open-source models can be fine-tuned to remove their safety restrictions at low cost.
The carbon footprint of generative AI globally is estimated to be growing steadily, with potential annual emissions ranging from 18.21 to 245.94 million tons of CO2 by 2035; the highest estimates for 2035 approach the emissions of the United States beef industry (estimated at 257.5 million tons annually as of 2024).
Proposed mitigation strategies include factoring potential environmental costs prior to model development or data collection, increasing the efficiency of data centers to reduce electricity/energy usage, building more efficient machine learning models, minimizing the number of times that models need to be retrained, developing a government-directed framework for auditing the environmental impact of these models, regulating for transparency of these models, regulating their energy and water usage, encouraging researchers to publish data on their models' carbon footprint, and increasing the number of subject matter experts who understand both machine learning and climate science.
A paper published by researchers at Amazon Web Services AI Labs found that over 57% of sentences from a sample of over 6 billion sentences from Common Crawl, a snapshot of web pages, were machine translated. Many of these automated translations were seen as lower quality, especially for sentences that were translated across at least three languages. Many lower-resource languages (e.g., Wolof, Xhosa) were translated across more languages than higher-resource languages (e.g., English, French).
In September 2024, Robyn Speer, the author of wordfreq, an open source database that calculated word frequencies based on text from the Internet, announced that she had stopped updating the data for several reasons: high costs for obtaining data from Reddit and Twitter, excessive focus on generative AI compared to other methods in the natural language processing community, and that "generative AI has polluted the data".
The adoption of generative AI tools led to an explosion of AI-generated content across multiple domains. A study from University College London estimated that in 2023, more than 60,000 scholarly articles—over 1% of all publications—were likely written with LLM assistance. According to Stanford University's Institute for Human-Centered AI, approximately 17.5% of newly published computer science papers and 16.9% of peer review text now incorporate content generated by LLMs. Many academic disciplines have concerns about the factual reliability of academic content generated by AI.
Visual content follows a similar trend. Since the launch of DALL-E 2 in 2022, it is estimated that an average of 34 million images have been created daily. As of August 2023, more than 15 billion images had been generated using text-to-image algorithms, with 80% of these created by models based on Stable Diffusion.
If AI-generated content is included in new data crawls from the Internet for additional training of AI models, defects in the resulting models may occur. Training an AI model exclusively on the output of another AI model produces a lower-quality model. Repeating this process, where each new model is trained on the previous model's output, leads to progressive degradation and eventually results in a "model collapse" after multiple iterations. Tests have been conducted with pattern recognition of handwritten letters and with pictures of human faces. As a consequence, data collected from genuine human interactions with systems may become increasingly valuable in the presence of LLM-generated content in data crawled from the Internet.
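The collapse dynamic can be sketched with a toy example that repeatedly fits a simple Gaussian "model" to samples drawn from the previous generation's fit, rather than an actual neural network; the fitted distribution progressively narrows over the generations, a simplified analogue of the degradation described above.

```python
# Toy sketch of model collapse: each generation is trained only on the
# previous generation's output, and the fitted distribution slowly collapses.
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=0.0, scale=1.0, size=20)   # small "real" human-generated dataset

for generation in range(1, 201):
    mu, sigma = data.mean(), data.std()           # "train" a Gaussian on current data
    data = rng.normal(mu, sigma, size=20)         # next dataset = the model's own samples
    if generation % 50 == 0:
        print(f"generation {generation}: fitted std = {sigma:.4f}")  # shrinks over time
```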
On the other hand, synthetic data is often used as an alternative to data produced by real-world events. Such data can be deployed to validate mathematical models and to train machine learning models while preserving user privacy, including for structured data. The approach is not limited to text generation; image generation has been employed to train computer vision models.
In June 2024, Reuters Institute published their Digital News Report for 2024. In a survey of people in America and Europe, Reuters Institute reports that 52% and 47% respectively are uncomfortable with news produced by "mostly AI with some human oversight", and 23% and 15% respectively report being comfortable. 42% of Americans and 33% of Europeans reported that they were comfortable with news produced by "mainly human with some help from AI". The results of global surveys reported that people were more uncomfortable with news topics including politics (46%), crime (43%), and local news (37%) produced by AI than other news topics.